110 research outputs found
Expanding a Database of Portuguese Tweets
This paper describes an existing database of geolocated tweets that were produced in Portuguese regions and proposes an approach to further expand it. The existing database covers eight consecutive days of collected tweets, totaling about 300 thousand tweets, produced by about 11 thousand different users. A detailed analysis of the content of the messages suggests a predominance of young authors who use Twitter as a way of reaching their colleagues with their feelings, ideas and comments. In order to further characterize this community of young people, we propose a method for retrieving additional tweets produced by the same set of authors already in the database. Our goal is to further extend the knowledge about each user of this community, making it possible to automatically characterize each user by the content he/she produces, cluster users, and open other possibilities in the scope of social analysis.
Detection of Emerging Words in Portuguese Tweets
This paper tackles the problem of detecting emerging words in a language, based on social network content. It proposes an approach for detecting new words on Twitter, and reports the results achieved for a collection of 8 million Portuguese tweets. This study uses geolocated tweets, collected between January 2018 and June 2019, and written in the Portuguese territory. The first six months of the data were used to define an initial vocabulary of known words, and the following 12 months were used for identifying new words, thus testing our approach. The set of resulting words was manually analyzed, revealing a number of distinct events, and suggesting that Twitter may be a valuable resource for researching neology and the dynamics of a language.
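The two-stage split described in this abstract (a baseline period that defines the known vocabulary, followed by a period scanned for unseen words) can be sketched roughly as follows. This is a minimal illustration and not the authors' method: the tokenizer, the `min_count` threshold, the function names, and the toy tweets are all assumptions introduced here.

```python
import re
from collections import Counter

# Assumption: a word is a run of Unicode letters; the paper's tokenizer is not specified.
TOKEN_RE = re.compile(r"[^\W\d_]+", re.UNICODE)


def tokenize(text):
    """Lowercase the text and return its letter-only tokens."""
    return TOKEN_RE.findall(text.lower())


def build_vocabulary(baseline_tweets):
    """Collect every token seen in the baseline period (e.g. the first six months)."""
    vocabulary = set()
    for tweet in baseline_tweets:
        vocabulary.update(tokenize(tweet))
    return vocabulary


def emerging_words(known_vocabulary, later_tweets, min_count=2):
    """Count tokens from the later period that are absent from the baseline vocabulary.

    min_count is a hypothetical frequency threshold used to drop one-off typos.
    """
    counts = Counter(
        token
        for tweet in later_tweets
        for token in tokenize(tweet)
        if token not in known_vocabulary
    )
    return {word: n for word, n in counts.items() if n >= min_count}


# Invented toy data, for illustration only.
baseline = ["bom dia lisboa", "hoje está sol"]
later = ["bom dia xpto", "xpto outra vez", "sol hoje"]
candidates = emerging_words(build_vocabulary(baseline), later)
```

In a real setting the candidate list would still need manual review, as the abstract notes, since unseen tokens include misspellings and named entities as well as genuine neologisms.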
Deep Emotion Recognition in Textual Conversations: A Survey
While Emotion Recognition in Conversations (ERC) has seen a tremendous
advancement in the last few years, new applications and implementation
scenarios present novel challenges and opportunities. These range from
leveraging the conversational context, speaker and emotion dynamics modelling,
to interpreting common sense expressions, informal language and sarcasm,
addressing challenges of real time ERC, recognizing emotion causes, different
taxonomies across datasets, multilingual ERC to interpretability. This survey
starts by introducing ERC, elaborating on the challenges and opportunities
pertaining to this task. It proceeds with a description of the emotion
taxonomies and a variety of ERC benchmark datasets employing such taxonomies.
This is followed by descriptions of the most prominent works in ERC with
explanations of the Deep Learning architectures employed. Then, it provides
advisable ERC practices towards better frameworks, elaborating on methods to
deal with subjectivity in annotations and modelling and methods to deal with
the typically unbalanced ERC datasets. Finally, it presents systematic review
tables comparing several works regarding the methods used and their
performance. The survey highlights the advantage of leveraging techniques to
address unbalanced data, the exploration of mixed emotions and the benefits of
incorporating annotation subjectivity in the learning phase
Hydrothermal processing of corn residues: process optimisation and products characterisation
Hydrothermal processing was used as a pre-treatment method for the selective solubilisation of hemicellulose from corn residues (leaves and stalks). The raw material was treated at a liquid-to-solid ratio of 10 g/g, under non-isothermal conditions (150-240 °C), and the effect of treatment on the composition of both liquid and solid phases was evaluated. The yields of solid residue and soluble products, e.g., oligosaccharides, monosaccharides, acetic acid and degradation compounds such as furfural and hydroxymethylfurfural, are presented and interpreted using the severity factor (log R0). The operational conditions leading to the maximum recovery of XOS (53% of initial (arabino)xylan) and to the highest glucan content of the solid residue (64%) were established for log R0 of 3.75 and 4.21, respectively. Under the severest condition, 95% of xylan was selectively solubilised and 90% of initial glucan was recovered in the solid residue, making it very attractive for further processing in a biorefinery framework.
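The abstract does not spell out how log R0 was computed. A commonly used definition of the severity factor is R0 = ∫ exp((T(t) − 100) / 14.75) dt, with T in °C and t in minutes. The sketch below evaluates it for a sampled non-isothermal temperature profile using trapezoidal integration. The function name, the parameter names, and the choice of integration rule are assumptions made here for illustration.

```python
import math


def log_severity_factor(times_min, temps_c, t_ref=100.0, omega=14.75):
    """Return log10(R0) for a sampled temperature profile.

    times_min: sample times in minutes, strictly increasing.
    temps_c:   temperatures in degrees Celsius at those times.
    t_ref and omega are the reference temperature and empirical constant
    of the commonly used severity-factor definition.
    """
    if len(times_min) != len(temps_c) or len(times_min) < 2:
        raise ValueError("need at least two matching time/temperature samples")

    r0 = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        f_prev = math.exp((temps_c[i - 1] - t_ref) / omega)
        f_curr = math.exp((temps_c[i] - t_ref) / omega)
        r0 += 0.5 * (f_prev + f_curr) * dt  # trapezoidal rule on the integrand
    return math.log10(r0)


# Hypothetical heating ramp, for illustration only (not data from the paper).
ramp_times = [0.0, 10.0, 20.0, 30.0]
ramp_temps = [100.0, 150.0, 200.0, 220.0]
ramp_severity = log_severity_factor(ramp_times, ramp_temps)
```

For an isothermal hold the integral reduces to R0 = t · exp((T − 100) / 14.75), which gives a quick sanity check on the numerical result.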
Disfluency Detection Across Domains
This paper focuses on disfluency detection across distinct domains using a large set of openSMILE features, derived from the Interspeech 2013 Paralinguistic challenge. Among the different machine learning methods applied, SVMs achieved the best performance. Feature selection experiments revealed that the dimensionality of the larger set of features can be further reduced at the cost of a small degradation. Different models trained with one corpus were tested on the other corpus, revealing that models can be quite robust across corpora for this task, despite their distinct nature. We have conducted additional experiments aiming at disfluency prediction in the context of IVR systems, and results reveal that there is no substantial degradation in performance, encouraging the use of the models in IVR domains.
Special Issue "Selected Papers from PROPOR 2020"
PROPOR 2020 will be the 14th edition of the biennial PROPOR conference, hosted alternately in Brazil and in Portugal. Past meetings were held in Lisbon, PT (1993); Curitiba, BR (1996); Porto Alegre, BR (1998); Évora, PT (1999); Atibaia, BR (2000); Faro, PT (2003); Itatiaia, BR (2006); Aveiro, PT (2008); Porto Alegre, BR (2010); Coimbra, PT (2012); São Carlos, BR (2014); Tomar, PT (2016); and Canela, BR (2018). More detailed information: https://propor.di.uevora.pt/.
The authors of a number of selected full papers of high quality will be invited after the conference to submit revised and extended versions of their originally accepted conference papers to this Special Issue of Information, published by MDPI, in open access. The selection of these best papers will be based on their ratings in the conference review process, quality of presentation during the conference, and expected impact on the research community. Each submission to this Special Issue should contain at least 50% of new material, e.g., in the form of technical extensions, more in-depth evaluations, or additional use cases and a change of title, abstract, and keywords. These extended submissions will undergo a peer-review process according to the journal’s rules of action. At least two technical committees will act as reviewers for each extended article submitted to this Special Issue; if needed, additional external reviewers will be invited to guarantee a high-quality reviewing process.
FCT CEECIND/01997/2017, UIDB/00057/202
Context-Dependent Embedding Utterance Representations for Emotion Recognition in Conversations
Emotion Recognition in Conversations (ERC) has been gaining increasing
importance as conversational agents become more and more common. Recognizing
emotions is key for effective communication, being a crucial component in the
development of effective and empathetic conversational agents. Knowledge and
understanding of the conversational context are extremely valuable for
identifying the emotions of the interlocutor. We thus approach Emotion
Recognition in Conversations leveraging the conversational context, i.e.,
taking into account previous conversational turns. The usual approach to
model the conversational context has been to produce context-independent
representations of each utterance and subsequently perform contextual modeling
of these. Here we propose context-dependent embedding representations of each
utterance by leveraging the contextual representational power of pre-trained
transformer language models. In our approach, we feed the conversational
context appended to the utterance to be classified as input to the RoBERTa
encoder, to which we append a simple classification module, thus discarding the
need to deal with context after obtaining the embeddings, since these already
constitute an efficient representation of such context. We also investigate how
the number of introduced conversational turns influences our model performance.
The effectiveness of our approach is validated on the open-domain DailyDialog
dataset and on the task-oriented EmoWOZ dataset.
Comment: WASSA'2
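As a rough illustration of feeding the conversational context together with the utterance to be classified, the sketch below builds one input string from the target turn and a configurable number of preceding turns. It is not the authors' implementation: the ordering of turns, the use of RoBERTa's `</s>` separator token between them, and the helper name `build_context_input` are assumptions. Tokenization and the classification module are omitted.

```python
def build_context_input(turns, target_index, num_context_turns, sep=" </s> "):
    """Join the target utterance with up to num_context_turns preceding turns.

    turns:             list of utterance strings in conversation order.
    target_index:      index of the utterance to classify.
    num_context_turns: how many previous turns to include (0 means no context).
    sep:               separator placed between turns (assumed, not from the paper).
    """
    if not 0 <= target_index < len(turns):
        raise IndexError("target_index is outside the conversation")
    if num_context_turns < 0:
        raise ValueError("num_context_turns must be non-negative")

    start = max(0, target_index - num_context_turns)
    window = turns[start:target_index + 1]  # context turns followed by the target
    return sep.join(window)


# Invented toy dialogue, for illustration only.
dialogue = ["Hi", "How are you?", "Terrible, I lost my keys."]
no_context = build_context_input(dialogue, 2, 0)
one_turn = build_context_input(dialogue, 2, 1)
```

Varying `num_context_turns` mirrors the abstract's question of how the number of introduced conversational turns affects performance. In practice the resulting string would also have to fit within the encoder's maximum input length.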
Teenage and Adult Speech in School Context: Building and Processing a Corpus of European Portuguese
We present a corpus of European Portuguese spoken by teenagers and adults in school context, CPE-FACES, with an overview of the differential characteristics of high school oral presentations and the challenges this data poses to automatic speech processing. The CPE-FACES corpus has been created with two main goals: to provide a resource for the study of prosodic patterns in both spontaneous and prepared unscripted speech, and to capture inter-speaker and speaking style variations common at school, for research on oral presentations. Research on speaking styles is still largely based on adult speech. References to teenagers are sparse and cross-analyses of speech types comparing teenagers and adults are rare. We expect CPE-FACES, currently a unique resource in this domain, will contribute to filling this gap in European Portuguese. Focusing on disfluencies and phrase-final phonetic-phonological processes, we show the impact of teenage speech on the automatic segmentation of oral presentations. Analyzing fluent final intonation contours in declarative utterances, we also show that communicative situation specificities, speaker status and cross-gender differences are key factors in speaking style variation at school.